Structural basis of dimerization of chemokine receptors CCR5 and CXCR4

G protein-coupled receptors (GPCRs) are prominent drug targets responsible for extracellular-to-intracellular signal transduction. GPCRs can form functional dimers that have been poorly characterized so far. Here, we show the dimerization mechanism of the chemokine receptors CCR5 and CXCR4 by means of an advanced free-energy technique named coarse-grained metadynamics. Our results reproduce binding events between the GPCRs occurring in the minute timescale, revealing a symmetric and an asymmetric dimeric structure for each of the three investigated systems, CCR5/CCR5, CXCR4/CXCR4, and CCR5/CXCR4. The transmembrane helices TM4-TM5 and TM6-TM7 are the preferred binding interfaces for CCR5 and CXCR4, respectively. The identified dimeric states differ in the access to the binding sites of the ligand and G protein, indicating that dimerization may represent a fine allosteric mechanism to regulate receptor activity. Our study offers structural basis for the design of ligands able to modulate the formation of CCR5 and CXCR4 dimers and in turn their activity, with therapeutic potential against HIV, cancer, and immune-inflammatory diseases.


S2
Supplementary   Table 3. Lipidic composition of the realistic model (also defined as the plasma membrane) of cellular membrane used in (a) Martini 2, (b) Martini 3, and (c) atomistic simulations. The parameters for some lipids are missing in the Martini 3 and atomistic force fields, therefore these were not included in the plasma membrane model.

Supplementary Discussions
The role of cholesterol in CCR5 and CXCR4 dimerization The membrane components including cholesterol molecules are known to be able to modulate GPCR activity and dimerization [4][5][6] . In order to investigate the role of cholesterol molecules in the formation of the CCR5 and CXCR4 dimer complexes, we computed cholesterol density maps around the receptors at the end of the investigated binding process.
We note that in our study all the molecules are fully free to diffuse in the membrane, including the cholesterol molecules that were equally distributed in the two layers of the membrane at the beginning of the simulations. Our results show a high density of cholesterol molecules in specific spots of the receptors both in the monomer and dimer states where the GPCRs residues form binding sites for cholesterol ( Supplementary Figures 15-19). Dimeric state -Interestingly, we found that when the protomers are in the dimeric state, they form novel binding sites for cholesterol, alternative to those described for the monomeric state. In detail, Supplementary Figures 16-18  Overall, our findings indicate that cholesterol contributes to the formation and stabilisation of the dimer structures, binding to specific pockets of the receptors. Some of these are peculiar to the receptor dimer structures, while others are also present in the monomer states.

Dimerization mechanism
CCR5 dimer -Three Lowest Energy Paths (LEPs) are found from the monomer to the dimer states of CCR5 (Fig. 5A, red, black, and purple solid lines). Two of them (black and purple) lead to the lowest energy dimer states A and B.
Interestingly, the LEP ending in A passes through the metastable state α, with the symmetric binding interface TM1-TM2-H8 (Fig. 5A) and it branches before reaching minimum A. This new path (red solid line) reaches another dimer state β, which has a higher energy value compared to A and B (-17.2 kcal/mol) and an asymmetric binding interface involving TM1 a -TM2 a -H8 a /TM3 b -TM4 b . The LEP ending in minimum B (Fig. 5A, purple solid line) does not pass through any metastable state, but instead it directly reaches the basin of this minimum.
CXCR4 dimer -A single LEP was identified for the CXCR4 homodimer that from the monomer state splits into two paths reaching the two lowest energy dimers A and B (Fig. 5B). One LEP passes through metastable state δ and then reaches A, whereas the other passes through metastable state ϵ before reaching B (black and purple solid lines, respectively). The metastable state δ shows a symmetric binding interface using TM5-TM6, while the metastable state ϵ is characterised by an asymmetric binding interface formed by TM4 a -TM5 a -TM6 a /TM1 b .

S28
CCR5-CXCR4 dimer -The CCR5-CXCR4 heterodimer has a single LEP connecting the monomer state and the dimers A and B (Fig. 5C). This path passes through the metastable state ζ with a binding interface formed by TM1 a -TM7 a -H8 a /TM5 b -TM6 b . From this state, the system can reach A and B following two separate paths (purple and black solid lines in Fig. 5C, respectively).
Finally, in order to energetically assess the CCR5 and CXCR4 X-ray dimer structures, a total of 480 µs of unbiased CG-MD calculations were performed using these structures as starting state (160 µs per system). We note that during such simulations, the systems only partially explore the free energy landscape (Supplementary Fig. 11). This is due to the slow receptor diffusion in the membrane, the many energy basins to visit and the barriers to cross, which make necessary the employment of enhanced sampling techniques like CG-MetaD for a thorough exploration of the free energy landscape.

Equilibration protocol in CG-MetaD calculations
All the systems were subjected to a multistep equilibration protocol, with the aim of reducing the unfavourable contacts between atoms, however preserving the original protein conformation. This protocol is important in CG simulations, as All the following CG-MD simulations were performed in the NPT ensemble at the temperature of 300K. The v-rescale thermostat 8 (coupling constant of 0.1 ps) was used and pressure was set to 1 bar with the semi-isotropic Berendsen barostat 9 (coupling constant of 5.0 ps). LINCS 10 algorithm was used to preserve bonds' lengths. The cut-off method was used to treat long-range electrostatic interactions with the cutoff distance set to 1.1 nm. Short-range repulsive and attractive dispersion interactions were simultaneously described by a Lennard-Jones potential, with a cutoff at 1.1 nm.

Equilibration protocol in atomistic calculations
The CG structures were reverted to atomistic structures that underwent a multistep equilibration protocol. The backmapping procedure from the CG to the atomistic structures might produce high-energy, unstable conformations that require an adequate preparation before running the production all-atom MD calculations. As a first step, a steepest descent minimisation calculation was performed excluding the non-bonded interactions within membrane and protein atoms 11 . This is done to avoid simulation issues in this stage due to forces generated by overlapping atoms. This step was followed by a steepest descent minimisation calculation that now includes the non-bonded interactions, followed by a series of short NVT MD simulations at increasing timesteps (from 0.2 to 2 fs). During the second minimisation and the MD simulations, restraints with a force constant of 1000 kJ mol -1 nm -2 were applied to protein backbone atoms and the phospholipid heads. The obtained system was then fully equilibrated with a second longer equilibration protocol schematised in the following table.

nm. A Coulomb and
Lennard-Jones potential, with a cut-off at 1.2 nm, were used to treat short-range electrostatic and dispersion interactions, respectively.

Setup of CG-MetaD calculations
Parameters for the CG-MetaD calculations were accurately chosen based on preliminary atomistic and CG MD simulations. In particular, the definition of the MetaD parameters requires particular attention since the CG force fields are less accurate than the atomistic ones, and some simulation parameters could be adequately adapted. As the first step, we performed a CG-MetaD run on the CCR5-CCR5 system, with tentative Gaussian width and height. In this run, we used a value of 0.05 nm for r and 0.1 rad for Ω as Gaussian width and 0.5 kJ/mol as Gaussian height with a pace of 500 steps. This simulation was run as a single walker. After 130 µs several binding and unbinding events were observed with the receptors' secondary structure preserved and the system able to cross even large free-energy barriers separating the diverse energy minima. Therefore, we maintained 0.5 kJ/mol as Gaussian height with a pace of 500 steps in the production runs. As regards Gaussian width, starting from the lowest energy minima states identified in the preliminary CG-MetaD simulation, we performed 10 µs unbiased CG-MD simulations collecting statistics on structural and energetic data in the free-energy wells. The evolution of the r and Ω CVs during such simulations was monitored, and the final values of the Gaussian width for the r and Ω CVs were chosen as half of the standard deviation computed from the values collected for the CVs in the energy minima (0.04 nm for r and 0.06 rad for Ω) 13 .

Lowest Energy Paths identification
The Lowest Energy Paths (LEPs) of receptor dimerisation were computed for each system using the data of the Binding Free Energy Surface (BFES) obtained from the CG-MetaD simulations. LEPs were calculated by subdividing the BFES into a bidimensional grid and, starting from one of the energy minima or metastable states, minimising the energy of each S30 forward step in the grid. A forward step is an increase in the value of protein-protein distance with respect to the previous one along the grid. The grid was obtained by discretising the bidimensional BFES into points spanning from the lowest values of distance and torsional angle sampled (distance 2 nm, torsion -π) to the highest one (distance 9 nm, torsion π) with a binning equal to one-third of the Gaussian width deposited during CG-MetaD calculations. Minimisation was performed in an iterative way using in-house tools, by repeating the calculation several times with different starting conditions and grid exploration parameters to reach a converged result. As starting conditions, we used different values of distance and torsional angle on the BFES taken from and around the energy basins of each metastable state/energy minimum. As exploration parameters, we employed different values of increase in the protein-protein distance for the forward step and the maximum amount of allowed change in the torsional angle. This approach allowed avoiding over or under-sampling of the grid. Over-sampling would lead to non-continuous LEPs, whereas under-sampling would result in sub-optimal LEPs due to insufficient minimisation of the energy along the path. bottom constraints were applied between the receptor and the ligand, the receptor and the Gα and Gβ subunits, and between the Gα and Gβ subunits using GROMACS "Pull Code" parameters. These constraints were modelled to mimic and preserve the quaternary structure reported in 7F1Q 18 . A CCR5 molecule in the CCR5 dimer and in the CCR5-CXCR4 heterodimer in the unbound state was replaced by aCCR5(G). The new systems were prepared and simulated using the procedure reported for the inactive dimers in the plasma membrane model. To ensure that the starting point of each simulation performed with aCCR5(G) was similar to the lowest energy dimeric structure identified from the CG-MetaD calculations, Target MD simulations (TMD) lasting 500 ns were performed using as reference the backbone beads of the minimum structures. In the case of minima B of both aCCR5(G)-CCR5 and aCCR5(G)-CXCR4 systems, TMD were unable to reproduce the quaternary structure of the inactive dimers due to steric clashes between the G protein and the S31 inactive protomer. Therefore, 500 ns of CV-based steered MD (SMD) calculations were performed to allow the systems to reach the closest CV values to those of energy basins B. Plumed 2.7.1 was employed to perform both TMD and SMD simulations 19 . These conformations were used as starting structures in the unbiased CG-MD simulations.
In case b), CG-MD simulations were performed using as starting poses the experimental dimeric structures PDB IDs Additional CG-MD and CG-MetaD calculations with realistic membrane and Martini 3 To further prove the robustness of our computational approach in sampling the protein-protein association process, the CCR5-CCR5 system was also simulated using the Martini 3 force field released during the review process of the present article 20,21 . We performed additional CG-MetaD simulations using the starting configuration employed for the original Martini 2 CG-MetaD calculations and the plasma membrane model reported in Supplementary Table 3B. Unbiased CG-MD calculations were also run to compare the sampling power of CG-MetaD to standard MD calculations.
CG-MetaD and CG-MD simulations were performed using the same simulation protocol and settings previously reported.
In both cases, 8 walkers/replicas were simulated for 4 μs (32 μs in total) using GROMACS 2020.6 17 . Supplementary Fig.   13 reports the difference in the capability of CG-MD and CG-MetaD to explore the dimerisation phase space.

In silico mutagenesis experiments
The CG structures of dimeric minima A and B identified for CCR5, CXCR4 and CCR5-CXCR4 were backmapped to atomistic level and simulated for 500 ns as described in the Methods section of the main text. Upon a conformational cluster analysis performed on these atomistic structures using the procedure previously described, the most populated centroids of each minimum were provided to the web servers MutaBind2 22 and mCSM-PPI2 23 Supplementary Tables 4 and 5, the relative binding free-energy difference ΔΔG was computed as the difference in free energy between the wild-type dimer and the mutated one.